Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: adding data specific p-value filters #788

Merged
merged 9 commits into from
Sep 30, 2024

Conversation

addramir
Copy link
Contributor

✨ Context

Data specific lead p-value CS filters.

πŸ›  What does this PR implement

πŸ™ˆ Missing

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g poetry run pre-commit run --all-files)?

@github-actions github-actions bot added bug Something isn't working size-S Step labels Sep 24, 2024
@addramir addramir marked this pull request as ready for review September 24, 2024 18:45
@@ -37,6 +37,10 @@ def __init__(
finngen_susie_finemapping_cs_summary_files=finngen_susie_finemapping_cs_summary_files,
)

finngen_finemapping_df = finngen_finemapping_df.validate_lead_pvalue(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor stylistic comment, and it is absolutely my preference, but I like to use as few variables as possible:

(
    # Reading Finngen finemapped dataset and convert it to study locus:
    FinnGenFinemapping.from_finngen_susie_finemapping(
        spark=session.spark,
        finngen_susie_finemapping_snp_files=finngen_susie_finemapping_snp_files,
        finngen_susie_finemapping_cs_summary_files=finngen_susie_finemapping_cs_summary_files,
    )
    # Flagging sub-significnat loci:
    .validate_lead_pvalue(
        pvalue_cutoff=FinngenFinemappingConfig().finngen_finemapping_lead_pvalue_threshold
    )
    # Write the output.
    .df.write.mode(session.write_mode).parquet(
        finngen_finemapping_out
    )
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please carefully check, I removed the variables but not sure

Copy link
Contributor

@DSuveges DSuveges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might missing something, but could not found the flagging of the in-house finemapped datasets. Also you mentioned there would be specific p-value cutoff for UKBPPP.

@@ -169,6 +175,7 @@ class FinngenFinemappingConfig(StepConfig):
_target_: str = (
"gentropy.finngen_finemapping_ingestion.FinnGenFinemappingIngestionStep"
)
finngen_finemapping_lead_pvalue_threshold: float = 1e-5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one thing I'm not sure about. You have added finngen_finemapping_lead_pvalue_threshold to the relevant config, and refer to as pvalue_cutoff=FinngenFinemappingConfig().finngen_finemapping_lead_pvalue_threshold in the step. However, finngen_finemapping_lead_pvalue_threshold it not an argument for FinnGenFinemappingIngestionStep. Is it OK? Would all parameters in the config passed to the step? Would that cause any problem? @project-defiant , what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, double-checked with @project-defiant and @d0choa and all parameters in the setepConfig classes needs to be parameters in the init function of the step.

EqtlCatalogueFinemapping.from_susie_results(processed_susie_df)
# Flagging sub-significnat loci:
.validate_lead_pvalue(
pvalue_cutoff=EqtlCatalogueConfig().eqtl_lead_pvalue_threshold
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment: EqtlCatalogueConfig().eqtl_lead_pvalue_threshold has to be passed as parameter of the step.

)
# Flagging sub-significnat loci:
.validate_lead_pvalue(
pvalue_cutoff=FinngenFinemappingConfig().finngen_finemapping_lead_pvalue_threshold
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment: FinngenFinemappingConfig().finngen_finemapping_lead_pvalue_threshold has to be an init parameter of the step.

.filter_credible_set(credible_interval=CredibleInterval.IS99)
# Flagging sub-significnat loci:
.validate_lead_pvalue(
pvalue_cutoff=WindowBasedClumpingStepConfig().gwas_significance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment.

Copy link
Contributor

@DSuveges DSuveges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any parameter that are defined in the StepConfigs needs to be defined as parameters of the init methods in the respective classes otherwise the step would fail. The reason is that the step object would be initialised with an unexpected parameter.

@addramir
Copy link
Contributor Author

Any parameter that are defined in the StepConfigs needs to be defined as parameters of the init methods in the respective classes otherwise the step would fail. The reason is that the step object would be initialised with an unexpected parameter.

Fixed. Please check

@DSuveges
Copy link
Contributor

Any parameter that are defined in the StepConfigs needs to be defined as parameters of the init methods in the respective classes otherwise the step would fail. The reason is that the step object would be initialised with an unexpected parameter.

Fixed. Please check

@addramir , sorry there's an other one here

Copy link
Contributor

@DSuveges DSuveges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR as it is now, makes it impossible to parametrise the p-value threshold of the PICS step. IF other steps are called as a stand-alone command, users can provide the custom p-value threshold. However this is not true for PICS step. The applied cutoff is defined by config (WindowBasedClumpingStepConfig().gwas_significance), which value cannot be overridden.

I think this might be fine for now, but we should follow the pattern of other steps where all parameters inside the steps are customisable.

@DSuveges DSuveges merged commit 8b253a5 into dev Sep 30, 2024
5 checks passed
@DSuveges DSuveges deleted the yt_adding_data_specific_fillters branch October 1, 2024 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working size-S Step
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants